Search for: All records

Creators/Authors contains: "Li, Qiwei"


  1. Background The United States has experienced a large surge in COVID-19 cases since the start of 2020. Identifying the diagnoses associated with a higher risk of COVID-19 death will help the community identify the most vulnerable populations and help those populations take better precautionary measures. Objective To identify demographic factors and health diagnosis codes that pose a high or a low risk of COVID-19 death, using individual health record data sourced from the United States. Methods We used logistic regression models to analyze the top 500 health diagnosis codes and demographics that have been identified as being associated with COVID-19 death. Results Among 223,286 patients who tested positive at least once, 218,831 (98%) were alive and 4,455 (2%) died during the study period. In our logistic regression analysis, four demographic characteristics of patients (age, gender, race, and region) were associated with COVID-19 mortality. Patients from the West region of the United States (Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming) had the highest odds ratio of COVID-19 mortality across the United States. 
In terms of diagnoses, Complications mainly related to pregnancy (Adjusted Odds Ratio, OR:2.95; 95% Confidence Interval, CI:1.4 – 6.23) had the highest odds ratio for COVID-19 death, followed by Other diseases of the respiratory system (OR:2.0; CI:1.84 – 2.18), Renal failure (OR:1.76; CI:1.61 – 1.93), Influenza and pneumonia (OR:1.53; CI:1.41 – 1.67), Other bacterial diseases (OR:1.45; CI:1.31 – 1.61), Coagulation defects, purpura and other hemorrhagic conditions (OR:1.37; CI:1.22 – 1.54), Injuries to the head (OR:1.27; CI:1.1 – 1.46), Mood [affective] disorders (OR:1.24; CI:1.12 – 1.36), Aplastic and other anemias (OR:1.22; CI:1.12 – 1.34), Chronic obstructive pulmonary disease and allied conditions (OR:1.18; CI:1.06 – 1.32), Other forms of heart disease (OR:1.18; CI:1.09 – 1.28), Infections of the skin and subcutaneous tissue (OR:1.15; CI:1.04 – 1.27), Diabetes mellitus (OR:1.14; CI:1.03 – 1.26), and Other diseases of the urinary system (OR:1.12; CI:1.03 – 1.21). Conclusion We found demographic factors and medical conditions, including some novel ones, that are associated with COVID-19 death. These findings can be used for clinical and public awareness and for future research purposes. 
    Free, publicly-accessible full text available March 31, 2026
  2. ABSTRACT The abundance of various cell types can vary significantly among patients with varying phenotypes and even those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases. 
  3. Abstract The field of spatially resolved transcriptomics (SRT) has greatly advanced our understanding of cellular microenvironments by integrating spatial information with molecular data collected from multiple tissue sections or individuals. However, methods for multi-sample spatial clustering are lacking, and existing methods primarily rely on molecular information alone. This paper introduces BayeSMART, a Bayesian statistical method designed to identify spatial domains across multiple samples. BayeSMART leverages artificial intelligence (AI)-reconstructed single-cell level information from the paired histology images of multi-sample SRT datasets while simultaneously considering the spatial context of gene expression. The AI integration enables BayeSMART to effectively interpret the spatial domains. We conducted case studies using four datasets from various tissue types and SRT platforms, and compared BayeSMART with alternative multi-sample spatial clustering approaches and a number of state-of-the-art methods for single-sample SRT analysis, demonstrating that it surpasses existing methods in terms of clustering accuracy, interpretability, and computational efficiency. BayeSMART offers new insights into the spatial organization of cells in multi-sample SRT data. 
  4. ABSTRACT Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications. 
  5. Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies on ad hoc kernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns. 
  6. Abstract Current clustering analysis of spatial transcriptomics data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, we have developed a multi-stage statistical method called iIMPACT. It identifies and defines histology-based spatial domains based on AI-reconstructed histology images and spatial context of gene expression measurements, and detects domain-specific differentially expressed genes. Through multiple case studies, we demonstrate iIMPACT outperforms existing methods in accuracy and interpretability and provides insights into the cellular spatial organization and landscape of functional genes within spatial transcriptomics data. 
  7. ABSTRACT Advances in next‐generation sequencing technology have enabled the high‐throughput profiling of metagenomes and accelerated microbiome studies. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co‐occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated from metagenomic sequencing technologies are high‐dimensional and compositional, suffering from uneven sampling depth, over‐dispersion, and zero‐inflation. These characteristics often challenge the reliability of the current methods for microbiome community detection. To study the microbiome co‐occurrence network and perform community detection, we propose a generalized Bayesian stochastic block model that is tailored for microbiome data analysis where the data are transformed using the recently developed modified centered‐log ratio transformation. Our model also allows us to leverage taxonomic tree information using a Markov random field prior. The model parameters are jointly inferred by using Markov chain Monte Carlo sampling techniques. Our simulation study showed that the proposed approach performs better than competing methods even when taxonomic tree information is non‐informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women. To the best of our knowledge, this is the first time the urinary microbiome co‐occurrence network structure in postmenopausal women has been studied. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies. 
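
Entry 1 in this list reports adjusted odds ratios with 95% confidence intervals from logistic regression. As a reading aid, the sketch below shows how an odds ratio and its interval relate to a log-odds coefficient and standard error. It assumes Wald-type intervals (the abstract does not state how its intervals were computed), and the function names `odds_ratio_ci` and `coef_from_ci` are hypothetical, introduced here for illustration only. The only figures taken from the abstract are OR 2.95 and CI 1.4 – 6.23 for Complications mainly related to pregnancy.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error.

    Assumes a Wald interval: exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=Z_95):
    """Recover (beta, se) on the log-odds scale from a reported Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2          # interval is symmetric on the log scale
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# Reported interval for "Complications mainly related to pregnancy": 1.4 to 6.23.
beta, se = coef_from_ci(1.4, 6.23)
or_hat, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={or_hat:.2f} CI=({lo:.2f}, {hi:.2f})")
```

Under the Wald assumption, the point estimate implied by the reported interval is its geometric mean, which agrees with the reported odds ratio of 2.95 to within rounding. The wide interval corresponds to a comparatively large standard error on the log-odds scale.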